-
Notifications
You must be signed in to change notification settings - Fork 347
chore: panic on consensus failures #2547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| }) | ||
| } | ||
|
|
||
| // TestSwitchToConsensusVoteExtensions tests that the SwitchToConsensus correctly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this test because it's only for vote extensions and we're not using them and they provoke panics and expect them to be recovered, but now we re-panic to stop the node.
| // some console or secure RPC system, but for now, halting the chain upon | ||
| // unexpected consensus bugs sounds like the better option. | ||
| onExit(cs) | ||
| panic(r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol panicking in a recover function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to re-propagate after we execute onExit() 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are just going to panic, then does it even make sense to recover? Will this cleanly shut down everything. Isn't it possible with the multiplexer that there would be some dangling threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I'll check it more closely
|
I'm fine with merging this, but I also don't see this as a huge priority. I know the consensus reactor can't recover here, but do we know node operators actually want this? |
|
also, is this a break in behavior? |
|
It's changing the behavior, so it can be considered breaking. However, given @cmwaters has the most context, I am waiting for his review before merging |
cmwaters
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure about simply panicking. If we are to use a panic, I just want to make sure it cleanly shuts down the node completely and not just the process that called the panic.
Yes this is a breaking change in behaviour but I think it's an important improvement in detecting problems in critical components. As it is breaking, we should probably target v7 with this (which is what I believe main targets)
| // some console or secure RPC system, but for now, halting the chain upon | ||
| // unexpected consensus bugs sounds like the better option. | ||
| onExit(cs) | ||
| panic(r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are just going to panic, then does it even make sense to recover? Will this cleanly shut down everything. Isn't it possible with the multiplexer that there would be some dangling threads
current main targets v6. v0.39.x-celestia can be deleted. As long as we don't have any v7 changes |
Closes #2222